Airborne LiDAR point cloud classification using PointNet++ network with full neighborhood features

Compared with other point clouds, the airborne LiDAR point cloud has its own characteristics. The deep learning network PointNet++ ignores the inherent properties of airborne LiDAR point clouds, so its classification precision on them is low. We therefore propose a framework based on the PointNet++ network. We propose an interpolation method that uses an adaptive elevation weight to exploit the fact that objects in airborne LiDAR point clouds exhibit discrepancies in their elevation distributions. A class-balanced loss function is used to handle the uneven category distribution of the point cloud data. Moreover, the relationship between a point and its neighbours is captured by densely connecting point pairs in multiscale regions and adding centroid features to learn contextual information. Experiments are conducted on the Vaihingen 3D semantic labelling benchmark dataset and the GML(B) benchmark dataset. The experiments show that the proposed method, which has additional contextual information and makes full use of the airborne LiDAR point cloud properties to support classification, achieves high accuracy and can be widely used in airborne LiDAR point cloud classification.


Introduction
The airborne light detection and ranging (LiDAR) system provides a new technical approach for acquiring 3D spatial data with reliable depth information. Airborne laser scanning (ALS) is one of the most important techniques for collecting data from real-world scenes. ALS point clouds have been utilised in various fields, such as surveying and mapping, forestry surveys [1], underwater navigation [2], 3D object detection [3,4], remote sensing [5], computer vision [6], disaster monitoring and cultural heritage protection [7].
Airborne LiDAR point clouds have the following characteristics: (1) The categories in airborne LiDAR scenes have evident geometric properties. (2) The objects have extreme scale variations. (3) The objects are distributed differently along the elevation [8]. Capturing high-resolution or fine-grained features for ALS point cloud classification is difficult because the geometric attributes are similar [9]. Early works on point cloud data processing focused on designing various handcrafted point descriptors derived from the surrounding neighbourhood [10,11]. These point descriptors include density, roughness and curvature [12,13]. Machine learning methods, e.g. support vector machine [14], random forest and AdaBoost [15], are commonly used to classify point clouds based on these handcrafted descriptors. In addition, some probabilistic graphical models are used to take advantage of contextual information [16,17]. However, the performance of these traditional methods relies heavily on the representation ability of the handcrafted features, and they have limited generalisability to large-scale wild scenes.
The deep learning method has recently achieved remarkable performance in scene classification, object detection and change detection [18]. Many researchers have used deep-learning-based methods for ALS point cloud classification. Some researchers projected raw point clouds into 2D images [19] and then used convolutional neural networks (CNNs) for ALS classification to make full use of the advantages of CNNs [20]. The voxel-based method is another technique for ALS point cloud classification [21]. However, these projection methods usually require handcrafted features to enhance the image representations, and the transformation from 3D to 2D inevitably causes information loss. In recent years, some studies have directly consumed raw point clouds and achieved state-of-the-art performance on some benchmarks.
The PointNet network [22] is a pioneering work that directly processes irregular point clouds. After the success of PointNet, many PointNet-like network architectures based on learned pointwise features have been proposed, such as PointNet++ [23], PointSift [24], PointCNN [25], PointWeb [26], DANCE-NET [27] and D-FCN [28]. In addition, graph-based methods are developing rapidly [29,30], such as LDGCNN [31] and Dynamic Graph CNN (DGCNN) [32]. ALS data have their own characteristics, including uneven point density, inexplicit structure and high redundancy. However, the works mentioned above have limitations and do not maximise the use of geometric structures. Thus, they lack the perception of geometry in the encoded semantic features.
PointNet++ performs well on many tasks. However, it performs poorly on large-scale airborne point clouds. In the PointNet++ network, the simple partitioning process cannot effectively capture complicated relationships [33]. Recent advances in deep learning for point cloud processing have mainly focused on the design of local aggregation operators. How to collect the information in the local area in the PointNet++ network remains an open question.
In this study, we propose a network modified from the PointNet++ network according to the characteristics of ALS point clouds. The main contributions of this study are as follows:
1. The proposed network can be trained in an end-to-end manner. The centroid point feature and the neighbour relationship are added in the sampling layer of the PointNet++ network to improve the representation ability and address the uneven distribution problem.
2. This study also utilises elevation information as a weight in the up-sampling layer to support classification for the ALS point cloud, which exhibits discrepancies in elevation distributions.
3. The class-balanced term is applied to the 3D point cloud to address the highly uneven category distribution problem.

Feature learning in local region
The PointNet++ network processes irregular point clouds directly and is highly robust to small perturbations and occlusion. Thus, the need for expensive manual feature computation is eliminated, and a new solution for 3D scene processing is provided. The core of the PointNet++ network consists of the sampling layer, the grouping layer and the PointNet layer. The PointNet++ network uses farthest point sampling (FPS) to select a subset of points in the sampling layer, the k-nearest-neighbour (KNN) method to partition points in the grouping layer and PointNet to collect features in the PointNet layer. PointNet++ treats individual points in local point sets independently. This approach lacks the perception of the whole neighbourhood structure. The feature on the centroid in the PointNet++ network is assigned from the neighbourhood points. This data processing method has the following drawbacks: (1) Directly replacing the centre point feature with the neighbourhood feature makes the selection of the centroid point very critical. (2) Information interaction amongst neighbouring points is lacking.
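The sampling and grouping steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the brute-force distance search and the toy coordinates are all assumptions made for illustration, and a practical network would use batched GPU operations.

```python
import math

def farthest_point_sampling(points, n_samples, start=0):
    """Return indices of n_samples points chosen by farthest point sampling."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen centroid.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < n_samples:
        # Pick the point that is farthest from all centroids chosen so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def knn_group(points, centroid_idx, k):
    """Return indices of the k points nearest to a centroid (itself included)."""
    c = points[centroid_idx]
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], c))
    return order[:k]

# Toy scene (hypothetical coordinates): two small clusters and one point between them.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
          (10.0, 10.0, 5.0), (10.0, 11.0, 5.0), (5.0, 5.0, 2.0)]
centroids = farthest_point_sampling(points, 3)
groups = {c: knn_group(points, c, 2) for c in centroids}
```

Each entry of `groups` is one local region. The drawbacks listed above arise because the features inside such a region are pooled without modelling how the grouped points relate to each other or to the centroid.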
A point is not isolated from other points; object attributes are formed by many points together [34]. Many features exist in the local region: neighbourhood features, centre point features and relationship features between neighbourhood points. In this section, the PointNet++ network is modified to enrich the description of the local neighbourhood information. Specifically, the centroid point information, the neighbourhood point relationship and the normalised coordinate information are combined. The feature learned for each local region therefore combines three terms: $\Delta p_{ij}$, the normalised coordinate information; $f'_i$, the information of centroid $i$; and $F_{jj'}$, the relationship between neighbour point $j$ and point $j'$. The value of $F_{jj'}$ can be obtained in many ways. The simplest options are the maximum, minimum, sum or average. However, these options destroy the module structure. Thus, we use an adaptive method to learn contextual information in local regions.
The adaptive method uses a weight $W_{jj'}$, which is calculated by passing the difference between the two feature vectors $F_j$ and $F_{j'}$ through an MLP network. This adaptive weight acts as a convolution filter over the neighbour point features. Calculating $W_{jj'}$ in this way is problematic, however, because $W_{jj'}$ is equal to zero when $j = j'$, indicating that the impact of $F_j$ on itself is zero. To retain more learned information, the impact of $F_j$ on itself is instead estimated from its own feature $F_j$ when $j = j'$.
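The treatment of the self-pair can be made concrete with a small sketch. The exact equations are not reproduced in this text, so the following is only one reading of the description: the input to the weight network is the feature difference for distinct points and the point's own feature when $j = j'$. The function `toy_mlp` is a hypothetical stand-in (a fixed bias-free scaling followed by ReLU) for the learned MLP, and all names and values are assumptions.

```python
def pairwise_relation_inputs(features):
    """Build the input to the weight MLP for every ordered pair (j, j')."""
    n = len(features)
    rel = [[None] * n for _ in range(n)]
    for j in range(n):
        for jp in range(n):
            if j == jp:
                # Self-pair: the difference would be all zeros, so use F_j itself.
                rel[j][jp] = list(features[j])
            else:
                rel[j][jp] = [a - b for a, b in zip(features[j], features[jp])]
    return rel

def toy_mlp(vec, scale=0.5):
    """Hypothetical stand-in for the learned MLP: bias-free scaling plus ReLU."""
    return [max(0.0, scale * v) for v in vec]

# Three neighbour points with two-dimensional features (illustrative values).
features = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0]]
rel = pairwise_relation_inputs(features)
weights = [[toy_mlp(r) for r in row] for row in rel]
```

With a bias-free network, a zero difference vector would map to a zero weight, which is why the diagonal entries are replaced by the point's own feature in this sketch.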
The modified PointNet++ network has several abstraction layers and feature propagation layers with skip connections. The features on centroid and relationships amongst neighbours are also added. The architecture is shown in Fig 1.

Class-balanced loss function
The category distribution in ALS scenes is highly uneven, as can be seen from the number of points in each category. The training efficiency is low when most locations do not provide useful learning signals. The challenge of long-tailed training data can be alleviated using two strategies: resampling and reweighting. In resampling, the minor classes are oversampled and the major classes are under-sampled. However, resampling always introduces a large number of duplicated samples. Thus, the model becomes susceptible to overfitting, and the training slows down. Reweighting the loss by inverse class frequency usually yields poor performance for highly imbalanced classes. This strategy also performs poorly on large-scale datasets.
Class-balanced terms perform well on 2D images. However, their performance on ALS point cloud classification is unknown. The class-balanced loss function is obtained by multiplying an existing loss function by a class-dependent weighting factor. The main contribution of the class-balanced loss function is this added weighting factor, $(1-\beta)/(1-\beta^{n_y})$, where $n_y$ is the number of samples in the ground truth class. Setting $\beta = 0$ corresponds to no reweighting, and $\beta \to 1$ corresponds to reweighting by inverse class frequency. Therefore, adjusting the hyper-parameter $\beta$ enables us to adjust the class-balanced term between no reweighting and reweighting via inverse class frequency.
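The behaviour of the weighting factor at the two ends of the range of β can be checked numerically. The snippet below is a minimal sketch; the function name and the two class counts are illustrative assumptions and are not taken from the datasets used in this paper.

```python
def class_balanced_weights(counts, beta):
    """Weight (1 - beta) / (1 - beta ** n_y) for each class count n_y."""
    return [(1.0 - beta) / (1.0 - beta ** n) for n in counts]

# Illustrative counts for a rare class and a frequent class (hypothetical values).
counts = [500, 200000]

no_reweighting = class_balanced_weights(counts, beta=0.0)
strong_reweighting = class_balanced_weights(counts, beta=0.9999)
```

With β = 0 every class receives a weight of 1. As β approaches 1, the rare class receives a much larger weight than the frequent class, approaching the inverse-frequency ratio.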
Research shows that class-balanced strategies bring remarkable improvements to commonly used loss functions, including sigmoid cross-entropy, softmax cross-entropy and focal loss [31]. Each of these has a class-balanced counterpart: the class-balanced softmax cross-entropy loss, the class-balanced sigmoid cross-entropy loss and the class-balanced focal loss. The class-balanced term is designed to address training on imbalanced data and can be applied to various deep networks.
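For reference, the three class-balanced losses are commonly written as below. These are the standard forms from the class-balanced loss literature and are assumed, rather than confirmed, to match the equations intended here. The symbols $\mathbf{z}$ (the logits over $C$ classes), $z_i^t$, $p_i^t$ and the focusing parameter $\gamma$ are notation introduced for this block.

```latex
\begin{align}
% General form: the class-dependent factor multiplies an existing loss.
\mathrm{CB}(\mathbf{z}, y) &= \frac{1-\beta}{1-\beta^{n_y}}\,\mathcal{L}(\mathbf{z}, y) \\
% Class-balanced softmax cross-entropy.
\mathrm{CB}_{\mathrm{softmax}}(\mathbf{z}, y) &= -\frac{1-\beta}{1-\beta^{n_y}}
  \log\!\left(\frac{\exp(z_y)}{\sum_{j=1}^{C}\exp(z_j)}\right) \\
% Class-balanced sigmoid cross-entropy, with z_i^t = z_i if i = y and -z_i otherwise.
\mathrm{CB}_{\mathrm{sigmoid}}(\mathbf{z}, y) &= -\frac{1-\beta}{1-\beta^{n_y}}
  \sum_{i=1}^{C}\log\!\left(\frac{1}{1+\exp(-z_i^t)}\right) \\
% Class-balanced focal loss, with p_i^t = 1 / (1 + \exp(-z_i^t)).
\mathrm{CB}_{\mathrm{focal}}(\mathbf{z}, y) &= -\frac{1-\beta}{1-\beta^{n_y}}
  \sum_{i=1}^{C}\left(1-p_i^t\right)^{\gamma}\log\!\left(p_i^t\right)
\end{align}
```

Setting $\gamma = 0$ in the focal form recovers the sigmoid cross-entropy form, so the focal variant can be read as adding a second, per-point down-weighting of easy examples on top of the per-class factor.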

Adaptive elevation interpolation method
PointNet++ adopts a hierarchical propagation strategy with distance-based interpolation and across-level skip links. Point feature propagation is achieved by interpolating the feature values of the $N_l$ points at the coordinates of the $N_{l-1}$ points. The interpolation weight in PointNet++ is the inverse distance based on the k nearest neighbours. Objects in ALS point clouds are distributed differently along the elevation. For example, the point elevation on the ground is the lowest, followed by low plants, roofs and trees. The elevation information can effectively help distinguish different objects.
In this part, we embed the elevation information into the interpolation function to further improve the performance on ALS point cloud classification. The weight coefficient of distance is calculated based on the elevation difference between two neighbour points.
The interpolated features $f^{(l-1)}(x)$ of point $i$ are calculated from the features $f^{(l)}(x)$ of its neighbouring points $j$, weighted by $w_{ij}$, the weight coefficient of distance. This coefficient depends on $d_{ij}$, the distance between points $i$ and $j$, and on $w^{z}_{ij}$, the weight coefficient of elevation. The elevation coefficient in turn depends on $z_{ij}$, the elevation difference, and $\bar{z}$, the average value of $z_{ij}$. Then, the interpolated features are concatenated with the skip-linked point features from the subsampled features.
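The structure of this interpolation can be sketched in a few lines. The exact form of the elevation weight is not recoverable from the text, so the sketch assumes, purely for illustration, an exponential decay `exp(-|z_ij| / z_mean)` multiplied into the inverse-squared-distance weight used by PointNet++. The function name, the `eps` guard and the toy data are likewise assumptions.

```python
import math

def interpolate_feature(target, neighbours, eps=1e-8):
    """Interpolate a feature vector at `target` from (xyz, feature) neighbours."""
    dists = [math.dist(target, xyz) for xyz, _ in neighbours]
    # Absolute elevation difference between the target and each neighbour.
    dz = [abs(target[2] - xyz[2]) for xyz, _ in neighbours]
    z_mean = sum(dz) / len(dz)
    # ASSUMED elevation weight: decays as the elevation difference grows.
    w_z = [math.exp(-z / (z_mean + eps)) for z in dz]
    # Combined weight: elevation weight times inverse squared distance.
    w = [wz / (d * d + eps) for wz, d in zip(w_z, dists)]
    total = sum(w)
    dim = len(neighbours[0][1])
    return [sum(wi * f[c] for wi, (_, f) in zip(w, neighbours)) / total
            for c in range(dim)]

# Two neighbours at the same 3D distance; only the second differs in elevation.
same_level = ((3.0, 0.0, 0.0), [1.0, 0.0])
higher_up = ((0.0, 0.0, 3.0), [0.0, 1.0])
out = interpolate_feature((0.0, 0.0, 0.0), [same_level, higher_up])

# With no elevation differences the scheme reduces to inverse squared distance.
flat = interpolate_feature((0.0, 0.0, 0.0),
                           [((1.0, 0.0, 0.0), [0.0]), ((2.0, 0.0, 0.0), [1.0])])
```

Under this assumed weight, the neighbour at the same elevation dominates the interpolated feature even though both neighbours are equally distant, which reflects the stated intent of letting elevation help separate objects such as ground, roofs and trees.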

Experimental result and analysis
The experiments are conducted on two airborne LiDAR point cloud datasets to evaluate the effectiveness of the proposed method. The first one is the Vaihingen 3D labelling benchmark dataset released by the International Society for Photogrammetry and Remote Sensing (ISPRS). This dataset was acquired by an airborne The Vaihingen benchmark dataset contains three blocks. Scene (I) is used for training, and scene (II) is utilised for evaluation. The training set contains 753876 points, and the test set contains 411722 points. The proportion of each category is shown in Table 1. Table 1 shows that the power line category accounts for only 0.07% of the points, whereas the impervious surfaces category accounts for 25.70%. The proportions of the different categories are extremely uneven. Training directly on this dataset can be misleading.
Following the evaluation protocol of the ISPRS 3D benchmark dataset, we use three metrics to evaluate our method: F1 score, overall accuracy (OA) and mean intersection over union (MIoU). MIoU measures the classification performance over all categories. The F1 score is calculated from the precision and recall values and is suitable for evaluating classification under an uneven class distribution. Precision, recall, F1 score and MIoU are formulated as (11)-(14):

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{11}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{12}$$

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{13}$$

$$\mathrm{MIoU} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FP_i + FN_i} \tag{14}$$

where TP, FN and FP are the true positives, false negatives and false positives, respectively, and $N$ is the number of categories. The average precision (AvgP), average recall (AvgR) and average F1 score (AvgF1) are also utilised.
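These metrics can all be derived from a confusion matrix of raw counts. The sketch below assumes rows are ground-truth classes and columns are predicted classes; the function names and the two-class example matrix are illustrative and not taken from the paper's results.

```python
def per_class_metrics(conf):
    """conf[i][j] = number of points of true class i predicted as class j."""
    n = len(conf)
    out = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                         # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(n)) - tp    # predicted c, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        out.append({"precision": precision, "recall": recall, "f1": f1, "iou": iou})
    return out

def summary(conf):
    """Overall accuracy, average F1 and mean IoU over all classes."""
    m = per_class_metrics(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[c][c] for c in range(len(conf)))
    return {
        "OA": correct / total,
        "AvgF1": sum(x["f1"] for x in m) / len(m),
        "MIoU": sum(x["iou"] for x in m) / len(m),
    }

conf = [[8, 2],
        [1, 9]]
scores = summary(conf)
```

OA is dominated by the frequent classes, whereas AvgF1 and MIoU weight every class equally, which is why the latter two are more informative when categories are as unevenly populated as they are here.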

Model training
We train our model on a single NVIDIA Tesla V100 GPU. The proposed method is implemented using the PyTorch framework. Given the limited GPU memory, the training scene is divided into regular blocks with a size of 40 m × 40 m in the horizontal direction. The model design draws on reference [14] and comparative experiments. The sampling number, batch size, decay rate, training epoch, learning rate and optimiser are set as 4096, 3, 0.007, 64, 0.001 and Adam, respectively. The model parameters are saved every five epochs. For convenience, the validation dataset is set in the same manner as the training dataset. The MIoU is calculated regularly to validate the model, and the model with the highest MIoU is regarded as the best model. During model testing, scene (II) is also segmented into 40 m × 40 m blocks in the horizontal direction, with a stride of 20 m. Then, the blocks are input into the best model to test the performance of our proposed method. We also test the loss function, adaptive elevation, added features and generalisation ability to investigate whether the ALS point cloud processing strategy is feasible.
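The block partitioning used for training and testing can be sketched as follows. This is a simplified illustration rather than the authors' code: the function names are hypothetical, blocks are anchored at the minimum x and y of the scene, and the handling of scene borders is an assumption because the text does not describe it.

```python
import math
from collections import defaultdict

def _covering(offset, block, stride):
    """Indices i of blocks [i * stride, i * stride + block) that contain offset."""
    hi = math.floor(offset / stride)
    lo = max(0, math.floor((offset - block) / stride) + 1)
    return range(lo, hi + 1)

def partition_blocks(points, block=40.0, stride=40.0):
    """Map (i, j) block indices to the indices of the points inside each block."""
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)
    blocks = defaultdict(list)
    for idx, (x, y, *_rest) in enumerate(points):
        for i in _covering(x - x0, block, stride):
            for j in _covering(y - y0, block, stride):
                blocks[(i, j)].append(idx)
    return dict(blocks)

# Toy scene (hypothetical coordinates in metres).
points = [(0.0, 0.0, 1.0), (10.0, 10.0, 2.0), (30.0, 5.0, 3.0),
          (50.0, 10.0, 4.0), (85.0, 0.0, 5.0)]

train_blocks = partition_blocks(points, block=40.0, stride=40.0)  # no overlap
test_blocks = partition_blocks(points, block=40.0, stride=20.0)   # 50% overlap
```

With a 20 m stride, a point can fall into several overlapping blocks and therefore receive several predictions at test time. How these are merged into one label is not stated in the text, so no merging rule is shown.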

Test of the loss function
We first investigate the performance of our proposed method with the class-balanced term to select the loss function. In the ISPRS labelling dataset, the number of points differs greatly between object categories. Training directly on this unbalanced dataset may cause the classes with few points to be underfitted and misclassified. We use the modified loss function to address this issue by forcing our model to focus on the classes with few points. We experiment on the ISPRS dataset to investigate the performance of the class-balanced term and select a suitable loss function. The result is shown in Table 2. According to the AvgP, AvgR, AvgF1 and OA values in Table 2, the cross-entropy losses score low, whereas the focal loss function obtains a satisfying classification result. This result demonstrates that the CB-focal loss obtains quite stable performance. The CB-focal loss incorporates a category-specific weight factor into the focal loss to reweight the classification loss. The class-balanced term combined with focal loss can outperform sigmoid cross-entropy and softmax cross-entropy. Therefore, we adopt the CB-focal loss as the loss function for the following experiments.

Test of adaptive elevation
Objects in ALS point clouds are distributed differently along the elevation. We investigate the effect of the adaptive elevation interpolation method, which embeds the elevation information into the interpolation function to improve the performance on ALS point cloud classification. The weight coefficient of the distance is calculated based on the elevation difference between two neighbour points. The performance of this adaptive elevation method is shown in Table 3.
The AvgF1 and the OA improve by 0.4% and 0.8%, respectively. This finding indicates that the elevation information can effectively help distinguish different objects.
Table 4 shows that the ALS point cloud classification performance improves as the features become richer, and our proposed method has the highest OA and AvgF1. Methods (a) and (b) are limited in the features they can extract for distinguishing the classes in urban scenes because of scene complexity, the unstructured nature of 3D point clouds, high sensor noise and incompleteness.

Test of added information in the local region
By adding the centroid point feature and the neighbour relationship, our proposed method can effectively recognise most of the objects, with an OA of 0.835 and an AvgF1 of 0.718. Seven of the nine object classes have F1 scores higher than 60%, indicating that the proposed method can capture valuable features. Although the ISPRS dataset has a small number of points in the power line category, our proposed method achieves satisfying performance on it, with an F1 score of 77.6%. Fig 3 shows the classification result of our proposed method and the error map. Table 5 shows the confusion matrix of the per-class accuracy of our proposed method.
The confusion matrix shows that our proposed method obtains quite satisfying performance on the impervious surfaces and power line categories, as indicated by the F1 score. The proposed method also obtains acceptable performance on the roof and impervious surfaces categories. The confusion matrix also shows that many shrub points are misclassified as low vegetation and tree points. A possible reason is that shrubs have topological structures and elevation distributions similar to those of low vegetation and trees.
Table 5. Classification confusion matrix of our proposed method. The precision, recall and F1 score of each class are reported. The numbers in the confusion matrix are normalised along each row.

Comparisons with other methods
We also compare our proposed method with other point-based models on the ISPRS 3D labelling benchmark. Table 6 lists the classification performance of our proposed method and all the compared methods. Table 6 shows that the PointNet network has the lowest OA. A possible reason is that the PointNet network lacks neighbour information and does not collect enough features to represent the complex structures in ALS point clouds. PointNet++ is a deep hierarchical network that recursively applies a unit PointNet on each grouped local region and makes use of neighbour information. Compared with the baseline model (PointNet++ network), the proposed method improves OA by 2.5% and AvgF1 by 5.2%, which shows that the modified strategies are feasible.
DGCNN uses a dynamic graph convolutional neural network to collect information. The neighbour points collected for each centroid are the same. PointSift and D-FCN are orientation-aware point feature learning methods. However, these methods ignore the relationship amongst neighbour points. The PointCNN with A-XCRF [35] model obtains a high AvgF1. However, this model utilises a postprocessing step to refine the classification results, whereas our proposed method does not involve any postprocessing techniques. Table 6 shows that our proposed method performs better than the compared methods, as indicated by the AvgF1. Our proposed method also achieves remarkably higher performance on the power line category. The power line category accounts for only a small proportion of the training data and is therefore difficult to predict, suggesting that the class-balanced term is feasible.

Validation of generalisation ability
The GML(B) dataset is also used to validate the generalisation ability of our model. It is part B of the GML dataset and was acquired by the airborne Leica ALTM 2050 system. All points in this dataset are annotated with four semantic categories: ground, building, tree and low vegetation. Each point contains only coordinate features, so we use the x, y, z coordinates as the model inputs in our experiments. The hyperparameters are the same as those for the experiments on the Vaihingen dataset, except for the size of the segmented region, which is set to 48 m in the horizontal direction because the point density is approximately 7 points/m², which is larger than that of the ISPRS benchmark dataset at 4-8 points/m². We also compare the performance of our proposed method with that of PointNet, PointNet++ and Random Forest on the GML(B) benchmark labelling dataset. Table 7 lists the classification performance of our proposed method and all the compared methods. Table 7 shows that ground, building and tree are well recognised. The worst classification result is for low vegetation. Our proposed method achieves the best performance in AvgF1. In terms of computing time, the PointNet network has the highest processing efficiency and the lowest precision, which indicates that the lack of neighbourhood features hinders object recognition. The PointNet++ network still has low precision in large-scale complex scene classification because it still considers each point in the local region independently. Our proposed method has the lowest computational efficiency, mainly because of the time spent extracting neighbourhood point relationships and centre point features. The Random Forest results come from reference [36], which does not report computing time. However, the Random Forest model considers each point's local geometry independently and ignores spatial dependencies.
Fig 4 shows the classification result of our proposed method and the error map.
In Fig 4, many points are classified correctly, and errors are mainly distributed over the object edges. The modified PointNet++ network makes full use of receptive field information at all levels on the ALS point classification task. Our model can produce an accurate classification for the majority of ALS point clouds.

Conclusion
This study proposes a modified PointNet++ network for airborne LiDAR point cloud classification based on the characteristics of such point clouds, and verifies the advantages of the proposed method through comprehensive ablation experiments on the Vaihingen 3D semantic labelling benchmark dataset and the GML(B) dataset. Compared with the baseline model (PointNet++ network) on the Vaihingen dataset, the proposed method improves OA by 2.5% and AvgF1 by 5.2%, which shows that the modified strategies are feasible. In addition, the proposed method achieves a new state-of-the-art performance for the power line category. The power line category accounts for only a small proportion of the training data and is therefore difficult to predict, suggesting that the class-balanced term is feasible. The generalisation ability is verified on the GML(B) dataset, and our model can produce an accurate classification for the majority of ALS point clouds. The modified PointNet++ network makes full use of receptive field information at all levels, and the elevation information can effectively help distinguish different objects in the ALS point cloud classification task. Our method does not involve any postprocessing techniques and can operate on unordered point sets with varying densities, so it can be widely used in ALS point cloud classification.